Dataset statistics
| Number of variables | 21 |
|---|---|
| Number of observations | 2260686 |
| Missing cells | 193160 |
| Missing cells (%) | 0.4% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 362.2 MiB |
| Average record size in memory | 168.0 B |
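The summary figures above can be cross-checked against each other and against the per-variable sections further down. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and the per-variable missing counts are the ones reported for shipment_state, platform, os and shipped_at (every other variable reports 0 missing).

```python
# Values copied from the Dataset statistics table.
n_vars = 21
n_obs = 2_260_686
missing_cells = 193_160
total_mib = 362.2

# Missing counts reported in the variable sections:
# shipment_state, platform, os, shipped_at.
per_variable_missing = [8_056, 44, 44, 185_016]

total_cells = n_vars * n_obs
missing_pct = 100 * missing_cells / total_cells
avg_record_bytes = total_mib * 1024 ** 2 / n_obs

print(sum(per_variable_missing))   # sum of per-variable missing counts
print(round(missing_pct, 1))       # missing cells as a percentage of all cells
print(round(avg_record_bytes, 1))  # average record size in bytes
```

The per-variable counts add up to the reported number of missing cells, and the two derived values round to the 0.4% and 168.0 B shown in the table.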
Variable types
| CAT | 11 |
|---|---|
| NUM | 10 |
Warnings
| order_created_at has a high cardinality: 2044439 distinct values | High cardinality |
|---|---|
| order_completed_at has a high cardinality: 2025311 distinct values | High cardinality |
| shipment_starts_at has a high cardinality: 7221 distinct values | High cardinality |
| s.city_name has a high cardinality: 97 distinct values | High cardinality |
| shipped_at has a high cardinality: 1864234 distinct values | High cardinality |
| shipment_id is highly correlated with ship_address_id and 1 other fields | High correlation |
| ship_address_id is highly correlated with shipment_id and 1 other fields | High correlation |
| order_id is highly correlated with ship_address_id and 1 other fields | High correlation |
| os is highly correlated with platform | High correlation |
| platform is highly correlated with os | High correlation |
| shipped_at has 185016 (8.2%) missing values | Missing |
| promo_total is highly skewed (γ1 = -21.09994488) | Skewed |
| order_created_at is uniformly distributed | Uniform |
| order_completed_at is uniformly distributed | Uniform |
| shipped_at is uniformly distributed | Uniform |
| shipment_id has unique values | Unique |
| total_cost has 494018 (21.9%) zeros | Zeros |
| rate has 856368 (37.9%) zeros | Zeros |
| promo_total has 1630341 (72.1%) zeros | Zeros |
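As a rough illustration of how Zeros and Skewed flags like the ones above could be derived, here is a minimal sketch. The function names and both thresholds are hypothetical and are not taken from the pandas-profiling configuration; the skewness is the plain moment ratio without bias correction, so on real data it may differ slightly from the γ1 the tool reports.

```python
def zero_share(values):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


def skewness(values):
    """Population skewness m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def flag(values, zeros_threshold=0.10, skew_threshold=20.0):
    """Return warning labels; both thresholds are illustrative assumptions."""
    flags = []
    if zero_share(values) > zeros_threshold:
        flags.append("Zeros")
    if abs(skewness(values)) > skew_threshold:
        flags.append("Skewed")
    return flags


# Toy column shaped like promo_total: mostly zeros plus one large negative value.
toy = [0] * 999 + [-25000]
print(flag(toy))

# Zero share of promo_total recomputed from the counts in the report.
promo_zero_pct = round(100 * 1_630_341 / 2_260_686, 1)
print(promo_zero_pct)
```

A column dominated by zeros with a few extreme discounts trips both checks, which matches the pattern reported for promo_total.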
Reproduction
| Analysis started | 2020-10-17 15:40:13.290907 |
|---|---|
| Analysis finished | 2020-10-17 15:46:40.443974 |
| Duration | 6 minutes and 27.15 seconds |
| Software version | pandas-profiling v2.9.0 |
| Download configuration | config.yaml |
df_index
Real number (ℝ≥0)
| Distinct | 872758 |
|---|---|
| Distinct (%) | 38.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 366535.6774 |
| Minimum | 0 |
| Maximum | 872757 |
| Zeros | 4 |
| Zeros (%) | < 0.1% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 28258 |
| Q1 | 141292.25 |
| median | 329598.5 |
| Q3 | 576502 |
| 95-th percentile | 802570.75 |
| Maximum | 872757 |
| Range | 872757 |
| Interquartile range (IQR) | 435209.75 |
Descriptive statistics
| Standard deviation | 250381.3771 |
|---|---|
| Coefficient of variation (CV) | 0.6831023351 |
| Kurtosis | -1.102031449 |
| Mean | 366535.6774 |
| Median Absolute Deviation (MAD) | 207838.5 |
| Skewness | 0.3425850119 |
| Sum | 8.286220745e+11 |
| Variance | 6.269083401e+10 |
| Monotonicity | Not monotonic |
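Several of the descriptive statistics are tied together by simple identities. The sketch below recomputes the coefficient of variation, variance, interquartile range and range from the rounded figures reported for df_index; the variable names are illustrative and not part of the report.

```python
# Values copied from the df_index tables.
mean = 366535.6774
std = 250381.3771
q1, q3 = 141292.25, 576502
minimum, maximum = 0, 872757

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(cv, variance, iqr, value_range)
```

Because the inputs are rounded, the recomputed values agree with the table only up to the last printed digits.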
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2047 | 4 | < 0.1% | |
| 109728 | 4 | < 0.1% | |
| 117932 | 4 | < 0.1% | |
| 128171 | 4 | < 0.1% | |
| 130218 | 4 | < 0.1% | |
| 124073 | 4 | < 0.1% | |
| 126120 | 4 | < 0.1% | |
| 103591 | 4 | < 0.1% | |
| 105638 | 4 | < 0.1% | |
| 99493 | 4 | < 0.1% | |
| Other values (872748) | 2260646 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4 | < 0.1% | |
| 1 | 4 | < 0.1% | |
| 2 | 4 | < 0.1% | |
| 3 | 4 | < 0.1% | |
| 4 | 4 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 872757 | 1 | < 0.1% | |
| 872756 | 1 | < 0.1% | |
| 872755 | 1 | < 0.1% | |
| 872754 | 1 | < 0.1% | |
| 872753 | 1 | < 0.1% |
user_id
Real number (ℝ≥0)
| Distinct | 654907 |
|---|---|
| Distinct (%) | 29.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1419608.138 |
| Minimum | 1400 |
| Maximum | 2925501 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 1400 |
|---|---|
| 5-th percentile | 200558 |
| Q1 | 889145 |
| median | 1466336 |
| Q3 | 1938162 |
| 95-th percentile | 2574073.5 |
| Maximum | 2925501 |
| Range | 2924101 |
| Interquartile range (IQR) | 1049017 |
Descriptive statistics
| Standard deviation | 716489.8138 |
|---|---|
| Coefficient of variation (CV) | 0.5047095705 |
| Kurtosis | -0.7707003322 |
| Mean | 1419608.138 |
| Median Absolute Deviation (MAD) | 517211 |
| Skewness | -0.09404034812 |
| Sum | 3.209288244e+12 |
| Variance | 5.133576533e+11 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 57430 | 395 | < 0.1% | |
| 346285 | 214 | < 0.1% | |
| 959397 | 194 | < 0.1% | |
| 226064 | 184 | < 0.1% | |
| 301738 | 164 | < 0.1% | |
| 1421169 | 154 | < 0.1% | |
| 621722 | 134 | < 0.1% | |
| 353314 | 132 | < 0.1% | |
| 145608 | 131 | < 0.1% | |
| 1094835 | 129 | < 0.1% | |
| Other values (654897) | 2258855 | 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1400 | 2 | < 0.1% | |
| 1459 | 34 | < 0.1% | |
| 1540 | 1 | < 0.1% | |
| 1541 | 1 | < 0.1% | |
| 1577 | 4 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2925501 | 1 | < 0.1% | |
| 2925487 | 1 | < 0.1% | |
| 2925486 | 1 | < 0.1% | |
| 2925484 | 1 | < 0.1% | |
| 2925480 | 1 | < 0.1% |
ship_address_id
Real number (ℝ≥0)
| Distinct | 2226096 |
|---|---|
| Distinct (%) | 98.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7795237.668 |
| Minimum | 8531 |
| Maximum | 12540602 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 8531 |
|---|---|
| 5-th percentile | 3093373.75 |
| Q1 | 5786986.75 |
| median | 8025621.5 |
| Q3 | 10024670.75 |
| 95-th percentile | 11799473.5 |
| Maximum | 12540602 |
| Range | 12532071 |
| Interquartile range (IQR) | 4237684 |
Descriptive statistics
| Standard deviation | 2716492.13 |
|---|---|
| Coefficient of variation (CV) | 0.3484809888 |
| Kurtosis | -0.8818737825 |
| Mean | 7795237.668 |
| Median Absolute Deviation (MAD) | 2098333.5 |
| Skewness | -0.2786823699 |
| Sum | 1.762258466e+13 |
| Variance | 7.379329495e+12 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5998444 | 4 | < 0.1% | |
| 6589785 | 4 | < 0.1% | |
| 8511299 | 4 | < 0.1% | |
| 9950778 | 4 | < 0.1% | |
| 6170386 | 4 | < 0.1% | |
| 8687488 | 4 | < 0.1% | |
| 3965544 | 4 | < 0.1% | |
| 10282551 | 4 | < 0.1% | |
| 6646354 | 4 | < 0.1% | |
| 9938600 | 4 | < 0.1% | |
| Other values (2226086) | 2260646 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8531 | 1 | < 0.1% | |
| 17410 | 1 | < 0.1% | |
| 23364 | 1 | < 0.1% | |
| 29576 | 1 | < 0.1% | |
| 45029 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12540602 | 1 | < 0.1% | |
| 12540591 | 1 | < 0.1% | |
| 12540588 | 1 | < 0.1% | |
| 12540558 | 1 | < 0.1% | |
| 12540435 | 1 | < 0.1% |
shipment_id
Real number (ℝ≥0)
| Distinct | 2260686 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6394665.419 |
| Minimum | 178163 |
| Maximum | 9916560 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 178163 |
|---|---|
| 5-th percentile | 2425272 |
| Q1 | 4831396.25 |
| median | 6588654 |
| Q3 | 8215820.75 |
| 95-th percentile | 9572736.75 |
| Maximum | 9916560 |
| Range | 9738397 |
| Interquartile range (IQR) | 3384424.5 |
Descriptive statistics
| Standard deviation | 2193967.791 |
|---|---|
| Coefficient of variation (CV) | 0.3430934455 |
| Kurtosis | -0.8942715243 |
| Mean | 6394665.419 |
| Median Absolute Deviation (MAD) | 1685556.5 |
| Skewness | -0.3156400708 |
| Sum | 1.445633059e+13 |
| Variance | 4.813494669e+12 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 8979498 | 1 | < 0.1% | |
| 9480344 | 1 | < 0.1% | |
| 5994948 | 1 | < 0.1% | |
| 5982658 | 1 | < 0.1% | |
| 5810622 | 1 | < 0.1% | |
| 5812669 | 1 | < 0.1% | |
| 5802426 | 1 | < 0.1% | |
| 5806520 | 1 | < 0.1% | |
| 5826998 | 1 | < 0.1% | |
| 8585144 | 1 | < 0.1% | |
| Other values (2260676) | 2260676 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 178163 | 1 | < 0.1% | |
| 273988 | 1 | < 0.1% | |
| 322307 | 1 | < 0.1% | |
| 337809 | 1 | < 0.1% | |
| 351762 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9916560 | 1 | < 0.1% | |
| 9916540 | 1 | < 0.1% | |
| 9916532 | 1 | < 0.1% | |
| 9916519 | 1 | < 0.1% | |
| 9916517 | 1 | < 0.1% |
order_id
Real number (ℝ≥0)
| Distinct | 2226106 |
|---|---|
| Distinct (%) | 98.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10800948.79 |
| Minimum | 3217 |
| Maximum | 15908203 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 3217 |
|---|---|
| 5-th percentile | 5935478.75 |
| Q1 | 8612992.5 |
| median | 10964336 |
| Q3 | 13194186 |
| 95-th percentile | 15099057.75 |
| Maximum | 15908203 |
| Range | 15904986 |
| Interquartile range (IQR) | 4581193.5 |
Descriptive statistics
| Standard deviation | 2872840.54 |
|---|---|
| Coefficient of variation (CV) | 0.2659803871 |
| Kurtosis | -0.9276173959 |
| Mean | 10800948.79 |
| Median Absolute Deviation (MAD) | 2281872.5 |
| Skewness | -0.2131980498 |
| Sum | 2.441755371e+13 |
| Variance | 8.253212767e+12 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 14809733 | 4 | < 0.1% | |
| 11710682 | 4 | < 0.1% | |
| 9482413 | 4 | < 0.1% | |
| 6866342 | 4 | < 0.1% | |
| 13986924 | 4 | < 0.1% | |
| 9423259 | 4 | < 0.1% | |
| 9965556 | 4 | < 0.1% | |
| 15098375 | 4 | < 0.1% | |
| 9670303 | 4 | < 0.1% | |
| 7016582 | 4 | < 0.1% | |
| Other values (2226096) | 2260646 | > 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3217 | 1 | < 0.1% | |
| 139128 | 1 | < 0.1% | |
| 139865 | 1 | < 0.1% | |
| 141189 | 1 | < 0.1% | |
| 141736 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15908203 | 1 | < 0.1% | |
| 15908189 | 1 | < 0.1% | |
| 15908183 | 1 | < 0.1% | |
| 15908146 | 1 | < 0.1% | |
| 15908012 | 1 | < 0.1% |
order_created_at
Categorical
| Distinct | 2044439 |
|---|---|
| Distinct (%) | 90.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2020-03-04 14:24:13 | 7 | < 0.1% | |
| 2020-07-11 10:49:56 | 6 | < 0.1% | |
| 2020-08-07 12:57:09 | 6 | < 0.1% | |
| 2020-06-14 11:16:32 | 6 | < 0.1% | |
| 2020-07-09 07:04:24 | 6 | < 0.1% | |
| 2020-05-07 15:01:22 | 6 | < 0.1% | |
| 2020-07-04 12:10:14 | 6 | < 0.1% | |
| 2020-06-06 12:41:49 | 6 | < 0.1% | |
| 2020-06-12 10:24:19 | 6 | < 0.1% | |
| 2020-07-11 11:59:33 | 6 | < 0.1% | |
| Other values (2044429) | 2260625 | > 99.9% |
Unique
| Unique | 1848483 |
|---|---|
| Unique (%) | 81.8% |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
order_completed_at
Categorical
| Distinct | 2025311 |
|---|---|
| Distinct (%) | 89.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2020-07-25 07:43:28 | 9 | < 0.1% | |
| 2020-07-26 12:32:27 | 7 | < 0.1% | |
| 2020-07-19 08:32:37 | 7 | < 0.1% | |
| 2020-07-25 09:05:36 | 7 | < 0.1% | |
| 2020-07-11 12:17:00 | 6 | < 0.1% | |
| 2020-07-29 07:46:46 | 6 | < 0.1% | |
| 2020-07-24 07:47:58 | 6 | < 0.1% | |
| 2020-07-14 09:56:36 | 6 | < 0.1% | |
| 2020-07-12 11:41:03 | 6 | < 0.1% | |
| 2020-06-28 08:54:07 | 6 | < 0.1% | |
| Other values (2025301) | 2260620 | > 99.9% |
Unique
| Unique | 1813436 |
|---|---|
| Unique (%) | 80.2% |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
shipment_starts_at
Categorical
| Distinct | 7221 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2020-05-29 12:00:00 | 1706 | 0.1% | |
| 2020-05-29 13:00:00 | 1704 | 0.1% | |
| 2020-05-29 11:00:00 | 1700 | 0.1% | |
| 2020-05-30 07:00:00 | 1699 | 0.1% | |
| 2020-08-14 12:00:00 | 1698 | 0.1% | |
| 2020-05-31 13:00:00 | 1676 | 0.1% | |
| 2020-05-29 07:00:00 | 1672 | 0.1% | |
| 2020-05-29 08:00:00 | 1654 | 0.1% | |
| 2020-05-31 07:00:00 | 1641 | 0.1% | |
| 2020-05-29 10:00:00 | 1632 | 0.1% | |
| Other values (7211) | 2243904 | 99.3% |
Unique
| Unique | 672 |
|---|---|
| Unique (%) | < 0.1% |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
retailer
Categorical
| Distinct | 46 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| METRO | 1305570 | 57.8% | |
| Лента | 540033 | 23.9% | |
| Ашан | 363180 | 16.1% | |
| МЕГАМАРТ | 13945 | 0.6% | |
| Азбука Вкуса | 6433 | 0.3% | |
| ВкусВилл | 6242 | 0.3% | |
| ВИКТОРИЯ | 3887 | 0.2% | |
| Командор | 3222 | 0.1% | |
| BILLA | 3153 | 0.1% | |
| Бахетле | 2525 | 0.1% | |
| Other values (36) | 12496 | 0.6% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 21 |
|---|---|
| Median length | 5 |
| Mean length | 4.913177681 |
| Min length | 3 |
s.order_state
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| complete | 2079616 | 92.0% | |
| canceled | 177848 | 7.9% | |
| resumed | 3194 | 0.1% | |
| cart | 28 | < 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 7.998537612 |
| Min length | 4 |
shipment_state
Categorical
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 8056 |
| Missing (%) | 0.4% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| shipped | 2075866 | 91.8% | |
| canceled | 175076 | 7.7% | |
| ready | 489 | < 0.1% | |
| collecting | 486 | < 0.1% | |
| ready_to_ship | 412 | < 0.1% | |
| shipping | 273 | < 0.1% | |
| pending | 28 | < 0.1% | |
| (Missing) | 8056 | 0.4% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 13 |
|---|---|
| Median length | 7 |
| Mean length | 7.064616227 |
| Min length | 3 |
s.city_name
Categorical
| Distinct | 97 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Москва | 539405 | 23.9% | |
| Санкт-Петербург | 147346 | 6.5% | |
| Краснодар | 124309 | 5.5% | |
| Новосибирск | 124021 | 5.5% | |
| Екатеринбург | 115493 | 5.1% | |
| Ростов-на-Дону | 112756 | 5.0% | |
| Московская Область | 106358 | 4.7% | |
| Самара | 91559 | 4.1% | |
| Тюмень | 67029 | 3.0% | |
| Красноярск | 62579 | 2.8% | |
| Other values (87) | 769831 | 34.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 24 |
|---|---|
| Median length | 7 |
| Mean length | 8.925242603 |
| Min length | 3 |
s.store_id
Real number (ℝ≥0)
| Distinct | 599 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 142.9528276 |
| Minimum | 1 |
| Maximum | 726 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 86 |
| median | 129 |
| Q3 | 188 |
| 95-th percentile | 327 |
| Maximum | 726 |
| Range | 725 |
| Interquartile range (IQR) | 102 |
Descriptive statistics
| Standard deviation | 96.92464657 |
|---|---|
| Coefficient of variation (CV) | 0.6780183939 |
| Kurtosis | 2.785670167 |
| Mean | 142.9528276 |
| Median Absolute Deviation (MAD) | 48 |
| Skewness | 1.095197914 |
| Sum | 323171456 |
| Variance | 9394.387113 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 46730 | 2.1% | |
| 2 | 41599 | 1.8% | |
| 1 | 38400 | 1.7% | |
| 8 | 38374 | 1.7% | |
| 105 | 36954 | 1.6% | |
| 10 | 36789 | 1.6% | |
| 83 | 32004 | 1.4% | |
| 65 | 31698 | 1.4% | |
| 110 | 31189 | 1.4% | |
| 11 | 30088 | 1.3% | |
| Other values (589) | 1896861 | 83.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 38400 | 1.7% | |
| 2 | 41599 | 1.8% | |
| 3 | 25250 | 1.1% | |
| 4 | 281 | < 0.1% | |
| 8 | 38374 | 1.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 726 | 42 | < 0.1% | |
| 724 | 80 | < 0.1% | |
| 723 | 1 | < 0.1% | |
| 721 | 11 | < 0.1% | |
| 720 | 17 | < 0.1% |
total_cost
Real number (ℝ≥0)
| Distinct | 1074 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 114.5803856 |
| Minimum | 0 |
| Maximum | 12509 |
| Zeros | 494018 |
| Zeros (%) | 21.9% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 4 |
| median | 158 |
| Q3 | 158 |
| 95-th percentile | 199 |
| Maximum | 12509 |
| Range | 12509 |
| Interquartile range (IQR) | 154 |
Descriptive statistics
| Standard deviation | 83.20589112 |
|---|---|
| Coefficient of variation (CV) | 0.7261791857 |
| Kurtosis | 287.526762 |
| Mean | 114.5803856 |
| Median Absolute Deviation (MAD) | 58 |
| Skewness | 3.472797069 |
| Sum | 259030273.6 |
| Variance | 6923.220316 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 158 | 783787 | 34.7% | |
| 0 | 494018 | 21.9% | |
| 98 | 341586 | 15.1% | |
| 199 | 193851 | 8.6% | |
| 99 | 118974 | 5.3% | |
| 1 | 71012 | 3.1% | |
| 178 | 8829 | 0.4% | |
| 163 | 6596 | 0.3% | |
| 162 | 6107 | 0.3% | |
| 198 | 6008 | 0.3% | |
| Other values (1064) | 229918 | 10.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 494018 | 21.9% | |
| 1 | 71012 | 3.1% | |
| 2.5 | 1 | < 0.1% | |
| 4 | 718 | < 0.1% | |
| 5 | 664 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 12509 | 1 | < 0.1% | |
| 6073 | 1 | < 0.1% | |
| 5483 | 1 | < 0.1% | |
| 5018 | 1 | < 0.1% | |
| 4968 | 1 | < 0.1% |
rate
Real number (ℝ≥0)
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.9771733 |
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 856368 |
| Zeros (%) | 37.9% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 5 |
| Q3 | 5 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 2.381824985 |
|---|---|
| Coefficient of variation (CV) | 0.8000290022 |
| Kurtosis | -1.786121872 |
| Mean | 2.9771733 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -0.407830903 |
| Sum | 6730454 |
| Variance | 5.673090258 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 1231176 | 54.5% | |
| 0 | 856368 | 37.9% | |
| 4 | 104867 | 4.6% | |
| 3 | 38379 | 1.7% | |
| 1 | 19823 | 0.9% | |
| 2 | 10073 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 856368 | 37.9% | |
| 1 | 19823 | 0.9% | |
| 2 | 10073 | 0.4% | |
| 3 | 38379 | 1.7% | |
| 4 | 104867 | 4.6% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 1231176 | 54.5% | |
| 4 | 104867 | 4.6% | |
| 3 | 38379 | 1.7% | |
| 2 | 10073 | 0.4% | |
| 1 | 19823 | 0.9% |
dw_kind
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| courier | 2252487 | 99.6% | |
| pickup | 6258 | 0.3% | |
| express_delivery | 1941 | 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 16 |
|---|---|
| Median length | 7 |
| Mean length | 7.004959114 |
| Min length | 6 |
promo_total
Real number (ℝ)
| Distinct | 35671 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -80.6346039 |
| Minimum | -25000 |
| Maximum | 0 |
| Zeros | 1630341 |
| Zeros (%) | 72.1% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | -25000 |
|---|---|
| 5-th percentile | -300 |
| Q1 | -150 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0 |
| Maximum | 0 |
| Range | 25000 |
| Interquartile range (IQR) | 150 |
Descriptive statistics
| Standard deviation | 219.3843216 |
|---|---|
| Coefficient of variation (CV) | -2.720721762 |
| Kurtosis | 1393.345312 |
| Mean | -80.6346039 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -21.09994488 |
| Sum | -182289520.1 |
| Variance | 48129.48058 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1630341 | 72.1% | |
| -250 | 186405 | 8.2% | |
| -200 | 142537 | 6.3% | |
| -300 | 79150 | 3.5% | |
| -199 | 34676 | 1.5% | |
| -150 | 29138 | 1.3% | |
| -500 | 24894 | 1.1% | |
| -100 | 21930 | 1.0% | |
| -350 | 9446 | 0.4% | |
| -400 | 6469 | 0.3% | |
| Other values (35661) | 95700 | 4.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -25000 | 6 | < 0.1% | |
| -24998 | 2 | < 0.1% | |
| -24822.80078 | 1 | < 0.1% | |
| -24647.86914 | 1 | < 0.1% | |
| -24442.44922 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1630341 | 72.1% | |
| -0.5799999833 | 1 | < 0.1% | |
| -0.75 | 1 | < 0.1% | |
| -0.9700000286 | 1 | < 0.1% | |
| -0.9800000191 | 1 | < 0.1% |
total_weight
Real number (ℝ≥0)
| Distinct | 79893 |
|---|---|
| Distinct (%) | 3.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 16369.29792 |
| Minimum | 0 |
| Maximum | 2502000 |
| Zeros | 1873 |
| Zeros (%) | 0.1% |
| Memory size | 17.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2984 |
| Q1 | 7559 |
| median | 12760 |
| Q3 | 20810 |
| 95-th percentile | 40966 |
| Maximum | 2502000 |
| Range | 2502000 |
| Interquartile range (IQR) | 13251 |
Descriptive statistics
| Standard deviation | 14761.64606 |
|---|---|
| Coefficient of variation (CV) | 0.9017885879 |
| Kurtosis | 556.4665488 |
| Mean | 16369.29792 |
| Median Absolute Deviation (MAD) | 6091 |
| Skewness | 8.470433175 |
| Sum | 3.700584264e+10 |
| Variance | 217906194.4 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4000 | 3322 | 0.1% | |
| 30000 | 3052 | 0.1% | |
| 3000 | 2835 | 0.1% | |
| 2000 | 2577 | 0.1% | |
| 6000 | 2394 | 0.1% | |
| 5000 | 2286 | 0.1% | |
| 7000 | 2184 | 0.1% | |
| 2800 | 1888 | 0.1% | |
| 0 | 1873 | 0.1% | |
| 200 | 1731 | 0.1% | |
| Other values (79883) | 2236544 | 98.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1873 | 0.1% | |
| 1 | 20 | < 0.1% | |
| 2 | 48 | < 0.1% | |
| 3 | 66 | < 0.1% | |
| 4 | 6 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2502000 | 1 | < 0.1% | |
| 1185042 | 1 | < 0.1% | |
| 1149050 | 1 | < 0.1% | |
| 1136795 | 1 | < 0.1% | |
| 1107000 | 1 | < 0.1% |
platform
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 44 |
| Missing (%) | < 0.1% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| app | 1763573 | 78.0% | |
| web | 497069 | 22.0% | |
| (Missing) | 44 | < 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
os
Categorical
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 44 |
| Missing (%) | < 0.1% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| ios | 929440 | 41.1% | |
| android | 873241 | 38.6% | |
| windows | 358208 | 15.8% | |
| mac | 62767 | 2.8% | |
| other | 28018 | 1.2% | |
| linux | 8968 | 0.4% | |
| (Missing) | 44 | < 0.1% |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Length
| Max length | 7 |
|---|---|
| Median length | 7 |
| Mean length | 5.211615412 |
| Min length | 3 |
shipped_at
Categorical
| Distinct | 1864234 |
|---|---|
| Distinct (%) | 89.8% |
| Missing | 185016 |
| Missing (%) | 8.2% |
| Memory size | 17.2 MiB |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2020-06-09 08:16:39 | 6 | < 0.1% | |
| 2020-05-29 09:31:02 | 6 | < 0.1% | |
| 2020-08-16 12:17:41 | 6 | < 0.1% | |
| 2020-05-23 10:10:59 | 6 | < 0.1% | |
| 2020-05-22 11:34:39 | 5 | < 0.1% | |
| 2020-05-25 07:56:42 | 5 | < 0.1% | |
| 2020-05-22 16:18:15 | 5 | < 0.1% | |
| 2020-05-30 07:45:29 | 5 | < 0.1% | |
| 2020-07-05 09:57:37 | 5 | < 0.1% | |
| 2020-08-03 11:09:10 | 5 | < 0.1% | |
| Other values (1864224) | 2075616 | 91.8% | |
| (Missing) | 185016 | 8.2% |
Unique
| Unique | 1670737 |
|---|---|
| Unique (%) | 80.5% |
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 17.69054968 |
| Min length | 3 |
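The timestamp columns are profiled as 19-character strings rather than as datetimes. For shipped_at, the minimum length of 3 and the mean length of about 17.69 are consistent with missing entries being counted as a three-character marker such as "nan", although the report does not state this. The sketch below checks that arithmetic and shows one way such strings could be parsed; the helper names are hypothetical, and the sample values are taken from the first and last rows of the sample tables.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # layout of the 19-character timestamp strings


def parse_ts(text):
    """Return a datetime, or None for a missing marker such as 'NaN'."""
    if text is None or text.strip().lower() in {"", "nan"}:
        return None
    return datetime.strptime(text.strip(), FMT)


def delay_seconds(starts_at, shipped_at):
    """Seconds between slot start and shipping, or None if either is missing."""
    start, shipped = parse_ts(starts_at), parse_ts(shipped_at)
    if start is None or shipped is None:
        return None
    return (shipped - start).total_seconds()


# Mean string length if the 185016 missing values count as 3 characters each
# (an assumption, see the lead-in).
n_obs, n_missing = 2_260_686, 185_016
mean_length = ((n_obs - n_missing) * 19 + n_missing * 3) / n_obs

print(delay_seconds("2020-02-20 07:00:00", "2020-02-20 08:08:54"))
print(delay_seconds("2020-09-01 10:30:00", "NaN"))
print(mean_length)
```

Parsing the columns before profiling would let a tool treat them as dates instead of high-cardinality categories.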
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
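The calculation described above can be sketched in a few lines of plain Python. This is an illustration of the formula only, not the code the report tool runs; the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors of covariance and standard deviations cancel, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


xs = [1, 2, 3, 4]
ys = [2, 5, 4, 9]
r = pearson_r(xs, ys)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(xs, [3 * y + 10 for y in ys])
print(r, r_rescaled)
```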
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
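A minimal sketch of that rank-based calculation follows. Giving tied values the average of their positions is an assumption made here for illustration; the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    dev_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    dev_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (dev_x * dev_y)


# A cubic relationship is nonlinear but monotonic, so the ranks agree exactly.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```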
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
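The pair-counting definition above corresponds to the variant usually called tau-a, sketched below with a hypothetical function name. Statistical libraries commonly default to a tie-adjusted variant (tau-b), so results on data with many ties may differ from this simple version.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# One swapped neighbour out of six pairs: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The loop visits every pair, so this version is quadratic in the number of observations and is only meant for small illustrative inputs.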
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
First rows
| | df_index | user_id | ship_address_id | shipment_id | order_id | order_created_at | order_completed_at | shipment_starts_at | retailer | s.order_state | shipment_state | s.city_name | s.store_id | total_cost | rate | dw_kind | promo_total | total_weight | platform | os | shipped_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 11019 | 171906 | 322307 | 2156687 | 2017-08-03 21:25:23 | 2020-02-18 14:07:00 | 2020-02-20 07:00:00 | METRO | complete | shipped | Москва | 21 | 168.0 | 0 | courier | 0.0 | 30170 | web | windows | 2020-02-20 08:08:54 |
| 1 | 1 | 62278 | 278832 | 387023 | 3021953 | 2018-03-02 17:22:04 | 2020-01-03 13:09:26 | 2020-01-03 17:00:00 | METRO | complete | shipped | Москва | 10 | 98.0 | 0 | courier | -150.0 | 11305 | web | windows | 2020-01-03 18:10:40 |
| 2 | 2 | 905126 | 468407 | 388943 | 3018198 | 2018-02-28 11:32:47 | 2020-02-12 12:39:28 | 2020-02-13 11:00:00 | METRO | complete | shipped | Москва | 21 | 98.0 | 5 | courier | 0.0 | 13589 | app | ios | 2020-02-13 12:33:53 |
| 3 | 3 | 21412 | 61962 | 421048 | 3030227 | 2018-03-07 20:37:27 | 2020-01-25 11:58:56 | 2020-01-25 18:00:00 | METRO | complete | shipped | Москва | 8 | 158.0 | 0 | courier | 0.0 | 9726 | web | mac | 2020-01-25 19:55:32 |
| 4 | 4 | 42110 | 378297 | 442659 | 2923996 | 2017-12-24 11:19:04 | 2020-01-07 14:30:44 | 2020-01-07 19:00:00 | METRO | complete | shipped | Москва | 2 | 163.0 | 0 | courier | 0.0 | 30323 | web | windows | 2020-01-07 19:51:37 |
| 5 | 5 | 550805 | 486427 | 502038 | 3234841 | 2018-06-21 13:14:20 | 2020-01-28 05:55:32 | 2020-01-28 09:00:00 | METRO | complete | shipped | Москва | 8 | 98.0 | 0 | courier | 0.0 | 10585 | app | ios | 2020-01-28 10:50:17 |
| 6 | 6 | 56809 | 345922 | 506181 | 3228747 | 2018-06-18 07:50:25 | 2020-01-27 16:10:33 | 2020-01-28 07:00:00 | METRO | complete | shipped | Москва | 3 | 258.0 | 0 | courier | 0.0 | 49598 | app | ios | 2020-01-28 07:43:45 |
| 7 | 7 | 71021 | 466579 | 510340 | 3217894 | 2018-06-13 09:48:40 | 2020-01-26 15:55:56 | 2020-01-26 19:00:00 | METRO | complete | shipped | Москва | 8 | 158.0 | 0 | courier | 0.0 | 23685 | app | android | 2020-01-26 19:57:48 |
| 8 | 8 | 66622 | 346422 | 521176 | 3084629 | 2018-04-11 13:31:29 | 2020-02-16 08:14:07 | 2020-02-16 15:00:00 | METRO | complete | shipped | Москва | 10 | 158.0 | 0 | courier | 0.0 | 21000 | app | ios | 2020-02-16 16:06:06 |
| 9 | 9 | 102583 | 595737 | 571826 | 3374586 | 2018-08-29 10:35:50 | 2020-01-03 14:46:11 | 2020-01-03 18:00:00 | METRO | complete | shipped | Москва | 2 | 158.0 | 0 | courier | -199.0 | 7960 | web | windows | 2020-01-03 19:16:50 |
Last rows
| | df_index | user_id | ship_address_id | shipment_id | order_id | order_created_at | order_completed_at | shipment_starts_at | retailer | s.order_state | shipment_state | s.city_name | s.store_id | total_cost | rate | dw_kind | promo_total | total_weight | platform | os | shipped_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2260676 | 872748 | 1482744 | 12301634 | 9916468 | 15657405 | 2020-08-27 08:33:50 | 2020-08-31 23:50:14 | 2020-09-01 10:30:00 | ВкусВилл | complete | NaN | Москва | 533 | 0.0 | 0 | express_delivery | 0.0 | 1965 | app | ios | NaN |
| 2260677 | 872749 | 2845749 | 12168752 | 9916493 | 15511307 | 2020-08-24 09:36:50 | 2020-08-31 23:45:52 | 2020-09-01 06:00:00 | METRO | complete | NaN | Кемерово | 182 | 158.0 | 0 | courier | 0.0 | 12266 | app | android | NaN |
| 2260678 | 872750 | 275725 | 12540558 | 9916504 | 15908146 | 2020-08-31 23:38:12 | 2020-08-31 23:43:45 | 2020-09-01 09:00:00 | METRO | canceled | NaN | Казань | 62 | 158.0 | 0 | courier | -300.0 | 2450 | app | ios | NaN |
| 2260679 | 872751 | 1813646 | 11858902 | 9916509 | 15167417 | 2020-08-17 21:46:33 | 2020-08-31 23:54:18 | 2020-09-03 10:00:00 | METRO | complete | NaN | Москва | 10 | 116.0 | 0 | courier | 0.0 | 52320 | app | android | NaN |
| 2260680 | 872752 | 874653 | 11893429 | 9916515 | 15206448 | 2020-08-18 14:14:37 | 2020-08-31 23:56:44 | 2020-09-01 05:00:00 | METRO | complete | NaN | Красноярск | 119 | 158.0 | 0 | courier | 0.0 | 1915 | app | ios | NaN |
| 2260681 | 872753 | 274733 | 12102714 | 9916517 | 15438866 | 2020-08-22 20:34:05 | 2020-08-31 23:56:47 | 2020-09-01 12:00:00 | Лента | complete | NaN | Санкт-Петербург | 200 | 199.0 | 0 | courier | -250.0 | 3990 | web | windows | NaN |
| 2260682 | 872754 | 275725 | 12540588 | 9916519 | 15908183 | 2020-08-31 23:43:47 | 2020-08-31 23:45:41 | 2020-09-01 09:00:00 | METRO | canceled | NaN | Казань | 62 | 158.0 | 0 | courier | -300.0 | 12390 | app | ios | NaN |
| 2260683 | 872755 | 275725 | 12540591 | 9916532 | 15908189 | 2020-08-31 23:45:44 | 2020-08-31 23:49:03 | 2020-09-01 09:00:00 | METRO | canceled | NaN | Казань | 62 | 158.0 | 0 | courier | -300.0 | 12110 | app | ios | NaN |
| 2260684 | 872756 | 275725 | 12540602 | 9916540 | 15908203 | 2020-08-31 23:49:06 | 2020-08-31 23:50:57 | 2020-09-01 09:00:00 | METRO | canceled | NaN | Казань | 62 | 158.0 | 0 | courier | -300.0 | 12390 | app | ios | NaN |
| 2260685 | 872757 | 2919887 | 12519984 | 9916560 | 15887851 | 2020-08-31 13:30:49 | 2020-08-31 23:58:22 | 2020-09-01 08:00:00 | METRO | complete | NaN | Санкт-Петербург | 83 | 0.0 | 0 | courier | -300.0 | 264 | web | windows | NaN |